{
 "cells": [
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "# Julia for Data Science\n",
    "\n",
    "* Data\n",
    "* Data processing\n",
    "* **Visualization**\n",
    "\n",
    "### Data visualization: generating nice looking plots in Julia is straight forward\n",
    "In what's next, we will see some of the tools that Julia plotting provides to produce high quality figures for your data."
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Part 1: plot math (specifically latex equations) in our plots"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "using LaTeXStrings\n",
    "using Plots\n",
    "pyplot()\n",
    "x = 1:0.2:4"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Let's declare some variables that store the functions we want to plot written in LaTex"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "x2 = L\"x^2\"\n",
    "logx = L\"log(x)\"\n",
    "sqrtx = L\"\\sqrt{x}\""
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Create three functions and plot them all!"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "y1 = sqrt.(x)\n",
    "y2 = log.(x)\n",
    "y3 = x.^2\n",
    "\n",
    "f1 = plot(x,y1,legend = false)\n",
    "plot!(f1, x,y2) # \"plot!\" means \"plot on the same canvas we just plotted on\"\n",
    "plot!(f1, x,y3)\n",
    "title!(\"Plot $x2 vs. $logx vs. $sqrtx\")"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Now we can annotate each of these curves using either text, or latex strings"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "annotate!(f1,[(x[6],y1[6],text(sqrtx,16,:center)),\n",
    "          (x[11],y2[11],text(logx,:right,16)),\n",
    "          (x[6],y3[6],text(x2,16))])"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Yay! Now you can convince a little child that x^2 grows much faster than sqrt(x)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Part 2: Stat Plots. \n",
    "\n",
    "2D histograms are really easy!"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "n = 1000\n",
    "set1 = randn(n)\n",
    "set2 = randn(n)\n",
    "histogram2d(set1,set2,nbins=20,colorbar=true)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "**Let's go back to our houses dataset and learn even more things about it!**"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "using DataFrames\n",
    "houses = readtable(\"houses.csv\")\n",
    "filter_houses = houses[houses[:sq__ft].>0,:]\n",
    "x = filter_houses[:sq__ft]\n",
    "y = filter_houses[:price]"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "gh = histogram2d(x,y,nbins=20,colorbar=true)\n",
    "xaxis!(gh,\"square feet\")\n",
    "yaxis!(gh,\"price\")"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Interesting! \n",
    "\n",
    "Most houses sold are in the range 1000-1500 and they cost approximately 150,000 dollars"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "*Let's see more stats plots.*\n",
    "\n",
    "We can convince ourselves that random distrubutions are indeed very similar.\n",
    "\n",
    "Let's do that through **box plots** and **violin plots**."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "using StatPlots\n",
    "y = rand(10000,6) # generate 6 random samples of size 1000 each\n",
    "f2 = violin([\"Series 1\" \"Series 2\" \"Series 3\" \"Series 4\" \"Series 5\"],y,leg=false,color=:red)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "boxplot!([\"Series 1\" \"Series 2\" \"Series 3\" \"Series 4\" \"Series 5\"],y,leg=false,color=:green)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "These plots look almost identical, so we do have the same distribution indeed.\n",
    "\n",
    "Let's study the price distributions in different cities in the houses dataset."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "some_cities = [\"SACRAMENTO\",\"RANCHO CORDOVA\",\"RIO LINDA\",\"CITRUS HEIGHTS\",\"NORTH HIGHLANDS\",\"ANTELOPE\",\"ELK GROVE\",\"ELVERTA\" ] # try picking pther cities!\n",
    "\n",
    "fh = plot(xrotation=90)\n",
    "for ucity in some_cities\n",
    "    subs = filter_houses[filter_houses[:city].==ucity,:]\n",
    "    city_prices = subs[:price]\n",
    "    violin!(fh,[ucity],city_prices,leg=false)\n",
    "end\n",
    "display(fh)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Part 3: Subplots are very easy!\n",
    "\n",
    "You can create your own layout as follows."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "mylayout = @layout([a{0.5h};[b{0.7w} c]])\n",
    "plot(fh,f2,gh,layout=mylayout,legend=false)\n",
    "\n",
    "# this layout:\n",
    "# 1 \n",
    "# 2 3"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Part 4: Bonus plot, XKCD kind of plots with PyPlot\n",
    "\n",
    "Let's load `PyPlot` and create some data."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "using PyPlot\n",
    "x = 1:100;\n",
    "y = vcat(ones(Int,20),1:10,10*ones(70));"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "xkcd()\n",
    "fig = figure()\n",
    "ax = axes()\n",
    "p = PyPlot.plot(x,y)\n",
    "annotate(\"some \\n caption\",xy=[30;10],arrowprops=Dict(\"arrowstyle\"=>\"->\"),xytext=[40;7])"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Modify the plot parameters:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "xticks([])\n",
    "yticks([])\n",
    "xlabel(\"index\")\n",
    "ylabel(\"value\")\n",
    "title(\"our first xkcd plot\")\n",
    "\n",
    "ax[:spines][\"top\"][:set_color](\"none\") \n",
    "ax[:spines][\"right\"][:set_color](\"none\") \n",
    "ax[:spines][\"left\"][:set_color](\"none\") \n",
    "ax[:spines][\"bottom\"][:set_color](\"none\") \n",
    "\n",
    "fig[:canvas][:draw]()"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "And finally, display the figure!"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "display(fig)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### Please let us know how we're doing!\n",
    "\n",
    "https://tinyurl.com/JuliaDataScience"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": []
  }
 ],
 "metadata": {
  "kernelspec": {
   "display_name": "Julia 0.6.2",
   "language": "julia",
   "name": "julia-0.6"
  },
  "language_info": {
   "file_extension": ".jl",
   "mimetype": "application/julia",
   "name": "julia",
   "version": "0.6.2"
  }
 },
 "nbformat": 4,
 "nbformat_minor": 2
}
